78ed45281dd746a265fff16ff75a02e5-Paper-Conference.pdf

Neural Information Processing Systems

Unfortunately, these theoretical results cannot well explain the empirical successes of deep learning, as they require the model size to be no larger than O(n) (the generalization bounds become vacuous otherwise).


AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation

Neural Information Processing Systems

To satisfy that, architectures must provide promising latency and performance trade-offs, support a variety of tasks, scale efficiently with respect to the amounts of data and compute, leverage available data from other tasks, and efficiently support various hardware.